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Abstract 
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p^i _ New iterative methods for solving linear equations are presented that are easy 

to use, generalize good existing methods, and appear to be faster. The new algo- 

f-^ | rithms mix two kinds of linear recurrence formulas. Older methods have either high 

£\j ■ order recurrence formulas with scalars for coefficients, as in truncated orthomin, or 

have 1st order recurrence formulas with matrix polynomials for coefficients, as in 
restarted gcr/gmres. The new methods include both: high order recurrence for- 
mulas and matrix polynomials for coefficients. These methods provide a trade-off 

I^S ' between recurrence order and polynomial degree that can be exploited to achieve 

greater efficiency. Convergence results are obtained for both constant coefficient 
and varying coefficient methods. 
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1. Introduction 

This paper develops a new class of iterative methods for solving linear equations. 
The new algorithms are easy to use, they generalize good existing methods, and 
they appear to be faster. 

The new class of algorithms combines the essential features of several methods, 
in the following way. When many known algorithms are defined very simply, their 
sequences of approximate solutions can be seen to satisfy recurrence formulas of 
two types. Either the recurrence formulas have high orders and scalar coefficients, 
as in truncated orthomin, or the recurrence formulas have order 1 and polynomials 
of matrices for coefficients, as in restarted gcr/gmres. The new methods are the 
generalization to formulas that mix the two types of recurrences. 

Methods in the new class build solutions from linear combinations of vectors 
using operators for coefficients, that is, using polynomials of matrices. The solu- 
tion sequences therefore are vector linear recurrences whose coefficients are linear 
transformations, whence the name operator coefficient methods. The choice of co- 
efficients leads to many, mostly unexplored variations. 

A convenient, and in older methods, a frequent choice of coefficients repetitively 
solves a simple minimization problem. A new convergence result proves this choice 
of coefficients often results in convergence and establishes upper bounds on the 
convergence rates. An examination of the parameters that govern the convergence 
rates and some numerical experiments suggest that the new methods are faster than 
those currently in use. Moreover, it is suggested that both old and new algorithms 
may be better implemented by means of well-established, least-squares procedures. 

These are the paper's major results. First, a simple characterization of iter- 
ative methods is proposed that unifies many algorithms. Similar descriptions are 
known for some algorithms, but they have not been systematically applied to others. 
Second, a spectrum of new iterative methods is found to lie between those of high 
recurrence order such as truncated orthomin, and those of high polynomial degree 
such as restarted gcr/gmres. Third, many of the algorithms are observed to have 
identical convergence rates. For the same convergence rate, there results a trade-off 
between degree and order that can be exploited to optimize efficiency. Fourth, new 
convergence results are proved for both constant coefficient and varying coefficient 
methods. 

The paper's organization follows the steps which led to discovery of the new 
methods. First, iterative algorithms are surveyed from the historical point of view. 
Even those familiar with the subject may find this survey interesting. Then, some 
algorithms are restated in a very simple form. It is suggested this form has peda- 
gogical and practical advantages. Two new generalizations come from it. One is a 
further simplification to a new class of inhomogeneous methods. Their convergence 
cannot be explained by existing analytical tools, nor is it analyzed here. The second 
and more important generalization is to operator coefficient methods. These have 
both homogeneous and inhomogeneous forms. A straightforward implementation 
for one choice of operator coefficients is proposed, and sufficient conditions for con- 
vergence are found. Finally, necessary and sufficient conditions for convergence of 
constant coefficient methods are found. To improve readability, appendices contain 
the proofs of theorems and descriptions of numerical experiments. 

Chronopoulos and Gear [5] [6] [8] have been led by other considerations to 
derive some algorithms in the new class of methods, and their work-in-progress 
examines more [7]. Their work should be consulted for additional insights. 
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2. Survey 

This section surveys the iterative methods to be analyzed in the sequel. The survey 
is mostly historical and phenomenological, with several omissions and some rear- 
rangement. Those who expect to read the paper in one sitting or who are familiar 
with the subject may prefer to begin at Section 3 and to consult this section as 
needed. 

Iterative methods are prescriptions for building sequences Xo x\ Xi . . . x n . . . 
that converge to the solution of Ax = y. If the prescriptions differ but the sequences 
are the same then the methods are the same, so some prescriptions might be better 
than others for the same method. The word prescription is new in this context. 

What appears to be the natural progression of ideas for iterative algorithms 
involves the manner of choosing various coefficients and parameters to build the 
sequences. The oldest iterative methods are incomplete because their parameters 
must be selected from ancillary information. Algorithms in the second phase of 
development make automatic choices that are globally correct in some sense for 
some matrices. Methods in the third phase choose coefficients that are suboptimal 
but serviceable for more matrices. 

The survey is bounded as follows. The first limitation is to polynomial meth- 
ods. Each term of the sequence equals a linear combination of other vectors with 
coefficients that are polynomials of A. The second limitation is to polynomials only 
of A. If prcconditioners are used, then A must be the result of any preconditioning 
matrix multiplications. The third limitation is to amounts of space and time per 
algorithm step limited independent of the step number. This means not all the pre- 
ceding solutions, residuals, or whatever are available to build the next approximate 
solution. 

The following notational conventions are used throughout. The exact solution 
of Ax — y is x*. The error in x n is e n = x* — x n . The residual of x n is r n — 
y — Ax n . Note that Ae n — r n . Symbols with negative subscripts equal zero, and 
matrix-vector norms are the 2-norm, unless stated otherwise. Complex numbers 
are assumed. The Hermitian part of A is (A* + A)/2. The set of eigenvalues of the 
matrix H is X(H). 

2a. Incompletely Specified Methods 

Richardson's 1st Order Method, 1910. Iterate from xq. 

Xn+l — cv n r n -\- x n 

Richardson's paper [43] makes interesting reading from the turn of the century. 
Residual polynomials 

n 

P n (X) = H(l-a 3 X) 

have been used to understand this and other methods because they provide formulas 
for the residuals and the errors. 

r n = P n (A)r e n = P n (A)e 

The many papers such as [41] on the proper choice of coefficients are outside the 
scope of this survey. If all the coefficients ao, Qi, ct2, • ■ • equal the same a, then the 



method converges for all y and xq exactly when all the eigenvalues of A lie strictly 
inside the circle through around 1/cx in the complex plane. 

The history of iterative methods swings like a pendulum between two extremes 
of fashion. On the one side are the basic iterations surveyed here, on the other 
side are things now called preconditioners. Preconditioning replaces Ax = y by 
BAx = By where B is an approximate inverse for A. The pendulum started with 
Richardson's 1st order method, and the first reversal probably occurred when inter- 
est reverted to relaxation schemes, some of which are much older if their appellations 
can be believed. 

The classic preconditioners were developed for use with the simplest 1st order 
method, for a n = 1 and x n+ \ = r n + x n . They split the matrix A = L + D + U 
into its diagonal and triangular parts, and choose B as follows. 

D- 1 Jacobi, 1845 

(D + L)- 1 Gauss-Seidel, 1873 

(— D + L)^ 1 SOR, Successive Overrelaxation, 1950 

2^(!£> + J7)-i£)(lD + L)- 1 SSOR, Symmetric SOR, 1950 



G. E. Forsythe said Gauss-Seidel was not known to Gauss and not recommended 
by Seidel [30] , so the very old references in this and [48] [53] [62] must be consulted 
to see how preconditioners predating Richardson's method, or at least Richardson's 
description of his method, were conceived. These and other relaxation methods 
arc now viewed as a class of preconditioners. SOR was invented independently 
by Frankel [21] and Young [59] [60], and SSOR by Aitken [1]. In the 1950's and 
60's many matrices and preconditioners were found that make Richardson's method 
converge. They are described by Golub [22], Golub and Varga [23] [24], Varga [53], 
Young [61] and many others. 

2nd Order Method, (1950) 1958. Iterate from x 

*£n+l C^n^n i Priori < "fn%n—l 

in which /3 = 1 and f3 n + 7„ = 1 . 

Frankel [21] invented this method in 1950 and named it after Richardson. He 
was deferential to a fault because in the same paper he invented SOR and named 
it after Liebmann. Frankel and many others omit the name of the third coefficient. 
Stiefel [50] makes "f n = 1 — f3 n appear to be a natural consequence of normalization. 
The conditions /3o = 1 and (3 n + j n — 1 enable the following analysis. With them 
the familiar residual polynomials exist 

r n = P„(A)r e n = P n (A)e 

and can be built from 

P (X) = 1 P n +i(X) = -a n XP n {X) + p n P n {X) + y n P n -!(X) 

just like normalized orthogonal polynomials. Stiefel may have been the first to 
describe the method's possibilities when he suggested consulting the theory of or- 
thogonal polynomials to find appropriate coefficients. 



Chebyshev Iteration, (1957) 1975. Iterate from xq 

■^n+l Q*nTn ~r Priori "r \*- Pn)Xn—l 



in which 



a = 1/d A) = 1 

2T n (d/c) _ 2dT n (d/c) 

Pn 



cT n+1 (d/c) " cT n+1 (d/c) 

where T n is the n th Chebyshev polynomial of the first kind. 
The Chebyshev iteration is the 2nd order method with residual polynomials 

,'d-X^ 

P n {X) = — 



(d 

Varga [52] derived the method differently by building rapidly converging sequences 
from more slowly converging ones. Mantcuffel [35] [36] [37] extended the method 
beyond symmetric positive definite matrices and coined the present name. The 
iteration converges for all y and xo exactly when all the eigenvalues of A lie strictly 
inside the ellipse through with foci d ± c in the complex plane. 

Stationary 2nd Order Method, 1982. Iterate from Xq and X-\. 

x n+ \ = ar n + fix n + (1 - ^)x n _ 1 



Iterative methods and linear recurrences are stationary when the coefficients 
are independent of n. The handful of papers on the stationary 2nd order method 
seek coefficients that optimize convergence for a given matrix. The answer to the 
simpler inverse question — which matrices converge for a given pair of coefficients? — 
can be obtained from [38] and a few napkins. Convergence occurs for all y and x$ 
exactly when all the eigenvalues of A lie strictly inside the ellipse through with 
foci 



(L± huEH 



a 



Moreover, the Chebyshev iteration's coefficients for these foci converge to the sta- 
tionary coefficients. 
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2b. Completely Specified, Terminating Methods 

By the time 2nd order methods were completely understood, the pendulum had 
already swung toward iterations with completely specified coefficients. The next 
method is the namesake for the entire class. 

Conjugate Gradient Algorithm, (1952) 1971. Iterate from xo. 

_ _ p n -i*Ar n 

Pn r n A Pn — 1 

Pn-1 APn-1 



Pn >n 

*^n+l — *^n ~r ~~~. Pn 

Pn Ap n 

Hestenes and Stiefel [28] drew this method from optimization theory. If A 
is Hcrmitian and positive definite, then the method searches from x n along the 
direction vector p n for the x n+ \ that minimizes 



||e„+i|U = y/e n+1 *Ae n+1 . 

It happens that the p's are A-orthogonal, the r's are orthogonal, and x n +\ is the 
global minimizcr within 

.To + span{j> Pi Pi ■■■ Pn} = 
x +span{r n r 2 ... r n } = 

.T + span{r Ar Q A 2 r ... A n r } ■ 

When the subspaces stop growing the last iterate is the exact solution. Hestenes 
[29] derives many algebraic identities including the few needed to establish global 
optimality and alternate expressions for the coefficients. Golub and O'Leary [25] 
provide an excellent annotated bibliography for the huge corpus. 

Elaborate formulas were a disadvantage on early computers so for many years 
the conjugate gradient algorithm was seen as a freakish alternative to Gaussian 
elimination [30]. By 1971 technological improvements enabled Reid [42] to view 
the algorithm as an iterative method and to obtain acceptable solutions after com- 
paratively few steps. A curious tribute to Reid is that his paper is no longer read 
because his ideas are so completely accepted. 

The matrices for which the conjugate gradient iteration finds an exact solution 
for all y and xo are the terminating class. This terminology is new. Preconditioning 
by A* yields A* Ax — A*y with A* A in the terminating class but with squared 
condition number. Faber and Mantcuffcl [17] [18] [19] answered a challenge of 
G. H. Golub and found the terminating classes for many polynomial methods based 
on A alone. The Russian literature contains a related announcement at about 
the same time [55]. Unfortunately, all the classes are severely restricted. If the 
Hcrmitian part of A is positive definite, then Joubert and Young [32] show from the 
work of Faber and Mantcuffcl that the conjugate gradient algorithm terminates for 
all y and xo exactly when cither A* = P(A) where P(X) is a polynomial of degree 
at most 1, or P(A) = where P(X) is a nonzero polynomial of degree at most 2. 
The direction vectors' ^4-orthogonality must be reinterpreted in the non-Hermitian 
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Conjugate Residual Algorithm, (1955) 1970. Iterate from Xq. 

p n -i*A*Ar n 



p n _ 1 *A*Ap n _ 1 



, Pn*A 



p n *A*Ap n 

This algorithm has an interesting genealogy. Hestenes and Stiefel allude to it 
[28], but Stiefel describes it fully without naming it [49], and Luenberger finally 
names it when he reinvents it [33] . The algorithm is one of many variations of the 
conjugate gradient algorithm with similar properties. This one uses a different inner 
product. If A is Hcrmitian and positive definite, then the p's are A*j4-orthogonal, 
the r's are A-orthogonal, and x n+ i is the global minimizcr of ||7"„+i|| within 

.x + span{p Pi P2 ■■■ Pn} = 
x +span{r n r 2 ... r n } = 

x + span {r Ar A 2 r ... A n ro} ■ 

Remember the convention that unspecified norms are the 2-norm. Joubcrt and 
Young [32] show from the work of Faber and Manteuffel [17] [18] [19] that among 
matrices whose Hcrmitian part is positive definite, the conjugate residual algorithm 
has the same terminating class as the conjugate gradient algorithm. 

Alternate Conjugate Residual Algorithm, 1951. Iterate from x$ 

. p n - 1 *A*A 2 Pn - 1 p n - 2 *A*A 2 p n - 1 

Pn = A Vn _ X - p n _! - - Pra _ 2 

Pn— 1 ■"■ SiPn-l Pn-1 ■"- s±Pn-2 

_ , Pn A r n 

2-n+l %n i I TZ , Pn 

p n *A*Ap n 
but choose po as in the original conjugate residual algorithm. 

This method has been invented by many, but Forsythe, Hestenes and Rosscr 
[20] appear to be the first [11] [25]. It is the conjugate residual method with a dif- 
ferent prescription whose terminating class is larger. Faber and Manteuffel [18] [19] 
show the method converges for all y and xq exactly when either A* = P(A) where 
P{X) is a polynomial of degree at most 1, or P(A) = where P{X) is a nonzero 
polynomial of degree at most 3. These are weaker conditions than for the conjugate 
residual algorithm because 3 replaces 2, and more importantly, the Hcrmitian part 
of A need not be positive definite. And yet the p's are A*A-orthogonal, and x n+ \ 
is the global minimizcr of ||rn+i|| within 

x +span{po Pi P2 ••• Pn} 

= x + span{r Ar A 2 r ... A n r n } 

just like the conjugate residual algorithm. 

The relative merits of the two prescriptions are not clear. The original fails for 
an indefinite matrix when a direction vector makes no contribution to the solution. 
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In this unlikely event the residuals do not change and the subsequent direction vec- 
tors lie within the span of the previous. The alternate version succeeds because 
it builds the direction vectors from a self-contained recurrence. In some sense the 
alternate prescription is an analytic continuation of the original. But the theoret- 
ically more powerful prescription amounts to evaluating the Lanczos recurrence, 
which may be more sensitive to rounding errors. 

The effects of rounding errors are a major disappointment for all these algo- 
rithms. Loss of orthogonality and failure to terminate are the most easily observed 
symptoms. The conjugate gradient algorithm should have p* Apj = for i =^ j, but 
in practice Pi*j4.Pj/(||Pt|| J 4||Pj|| J 4) grows exponentially with \i — j\. Figure 1 makes 
the more difficult comparison between the numerically computed direction vectors 
and those that would be obtained from error-free arithmetic. This data may be the 
first of its kind in print. The formulas for the direction vectors evidently are unsta- 
ble because they magnify the small perturbations due to rounding error. Figure 2 
shows the resulting delayed convergence. Greenbaum has a detailed analysis of the 
retarded convergence [27], but no universal, inexpensive cure is known. 

Interest in the conjugate gradient algorithm was intense for a time. Many ter- 
minating algorithms were proposed with enlarged terminating classes. Generalized 
refers indiscriminately to these methods for which there is no consistent naming 
convention. The generalized conjugate gradient algorithm [9] [10] [58] established 
the use of very different inner products and to some extent prompted the work of 
Faber and Mantcuffel. Ashby, Manteuffel and Saylor classify many generalizations 
of this kind [3] . 

Some terminating generalizations of the conjugate gradient algorithm have 
mostly theoretically use. The analyses of Faber and Manteuffel and of Joubert 
and Young actually proceed by seeking conditions under which the original algo- 
rithm is equivalent to a generalized one that uses all the direction vectors to build 
the next. The same generalization can be made of the conjugate residual algorithm. 

Generalized Conjugate Residual Algorithm, 1982. Iterate from Xq. 

Pn-i A Ar„ 



J n ' n 



EPn-i -f± -f±l n 
* A * A i 

_ 1 Pn—i A /±p n -i 



_ Pn A " 71 

p n *A*Ap n 



Formulas of this kind may be traced to Arnoldi [2] , but this algorithm is from 
Elman [14] and Eisenstat, Elman and Schultz [13]. Its terminating class includes all 
matrices with positive definite Hermitian parts. The method itself is impractical, 
and is outside the bounds of the present survey, because all the previous direction 
vectors must be saved. It can be made practical either by truncating the sum, or 
by restarting the iteration, as follows. 
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Figure 1. 2-norm relative errors in the computed basis vectors of the conjugate 
gradient algorithm for a system of order 100. Appendix 2 and Section 2b explain 
the calculations. 
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Figure 2. Relative A-norm solution errors for the system of Figure 1. The upper 
curve is for single precision and the lower for reorthogonalized double precision. 
Appendix 2 and Section 2b explain the calculations. 
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2c. Completely Specified, Non- Terminating Methods 
Orthomin(m), 1976. Iterate from xq. 

"L „ .* A* A r 
_ V^ Pn—i -H.Ji.ln 

Pn ~r„- 2_^ - .*A*A n f P "-* 
. . Pn—t ti J±p n - t 

z— 1 



_ , Pn A r n 

2-n+l — X n H J?„ 

Vinsome [54] invented this truncated algorithm and made non-terminating or 
suboptimal convergence respectable again. Elman [14] and Eiscnstat, Elman and 
Schultz [13] prove convergence for all y and Xq whenever the Hcrmitian part of A 
is positive definite. Specifically, they prove 



<n+l 



< 




\X(A* + A)\ 12 



21141: 



< ||rn| 



They also show each set of m + 1 consecutive direction vectors is j4*A-orthogonal, 
and :c n +i minimizes ||r n +i|| within 

Xn-mAO + Span {p n - m ... p n -2 Pn-1 Pn} ■ 

The notation is rigorously correct because things with negative subscripts vanish 
and the wedge in x n - m/ \o means the maximum of n — m and 0. It is an open ques- 
tion why the suboptimal convergence results ignore m and exclude the Hcrmitian 
indefinite case for which the alternate conjugate residual algorithm has terminating 
convergence. 

The truncated algorithm is expected to have dependable convergence for many 
matrices rather than terminating convergence for a few. As m increases the termi- 
nating class grows beyond that of the conjugate residual method, but only through 
the addition of matrices having at most m 2 distinct eigenvalues [18] [19] [32]. Ter- 
mination also occurs, of course, for m so impractically large that the method reverts 
to the generalized conjugate residual method. 

As with the conjugate residual algorithm, there is an alternate version that 
generates the direction vectors independently of the residuals. Saad and Schultz 
survey many equivalent prescriptions [46] . Saad develops some of these algorithms 
himself, and names the class incomplete orthogonalization methods [44]. Jea and 
Young [31] also develop a broad class of methods and also provide extensive refer- 
ences to other work. In their terminology, the original conjugate residual algorithm 
is an orthomin and the alternate is an orthodir, both with m = 1 and with specific 
choices for inner products and the like. Faber and Mantcuffel [17] [18] [19] actually 
treat the alternate prescriptions, but Joubert and Young [32] and Ashby, Mantcuffel 
and Saylor [3] carefully observe the distinction. 

Gcr(k), Restarted Generalized Conjugate Residual Method, 1979. 

Iterate from xq, and for each x n obtain x n +i by building the sequence 

X„=X^ CC (1 ) X(2) ■■■ S(fc+1) = X n+ l 
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by iterating the generalized conjugate residual algorithm from x/q) to 

v^ P(i-i)* A * Ar U) 

»=1 Pf 4 - 1 ) A A P(t-l) 
P{j)*A*T{j) 

X « + »- X M + Pu) *A*A PU) p M 

in which the subscripts in parentheses indicate dependence on n. 

The idea of restarting an algorithm contrasts with truncating in orthomin(?n). 
It is the theory that jiggling the ignition recharges the battery and may be due 
to several people. Luenberger [34] restarts the conjugate gradient algorithm to 
circumvent numerical difficulties, while Eisenstat, Elman, Schultz and Sherman 
[12] [13] [14] restart the generalized conjugate residual method to conserve memory 
space. They show that gcr(fc) minimizes ||r n +i|| within 

x n + span {r n Ar n A 2 r n ... A k r n } 

and converges when the Hcrmitian part of A is positive definite. 

Like truncated orthomin, restarted gcr is expected to have dependable conver- 
gence for many matrices rather than terminating convergence for a few. And again 
there is an alternate prescription. This one additionally makes the direction vectors 
orthogonal with respect to the Euclidean inner product rather than the A* A inner 
product. 

Gmres(k), Restarted Generalized Minimum Residual Algorithm, 
1983. Iterate from xq, and for each x n build the orthonormal sequence 

irrj|=P(i) P(2) P(3) ••• P(k+i) 
from Arnoldi's recurrence equations 

3 

h U+hj)PU+i) = A P{J) -^2 h (i,i)P(i) 

i=l 

with appropriately chosen huj) 's, and then chose au\ 's to minimize 
|| ||r„||ei - [/i (i ,j)] [a (j) ] \\ 

in which e\ is the first column of an identity matrix and [hu t j\] is the 
(k + 1) x k matrix of recurrence coefficients, and finally construct 

k 
X n +1 = X n + 22 a U) P{j)- 

The subscripts in parentheses indicate dependence on n. 

Saad and Schultz [45] [47] developed this most widely used version of restarted 
gcr. It uses an alternate prescription to generate the normalized direction vectors, 
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with the precise choice of ft(»,j)'s clear and mercifully omitted. The coefficients 
for x„ are selected from the small matrix that describes the action of A on the 
orthonormal basis. Nevertheless, gmres(fc) minimizes ||r n +i|| within 

x„+span{r„ Ar n A 2 r n ... A k ~ 1 r n ) 

just like gcr(fc— 1). Both prescriptions sometimes have difficulty solving the least 
squares problem, and other prescriptions have been proposed [56] [57]. 

Note the multiple names. Gcr(fc— 1) is gmrcs(fc). The first generalizes the 
conjugate residual algorithm and the second generalizes the minimum residual al- 
gorithm. So there is yet another line of development which leads to the same 
methods. But this is too broad a subject for discussion. 



3. Simplification 

The completely specified algorithms in the survey can be reduced to simpler but 
equivalent form. Here, all inessential notation is removed to leave what may be the 
vital core. This naive approach leads to useful generalizations in subsequent sec- 
tions, and even to useful implementations. The simplified algorithms are strikingly 
similar. Each chooses its next solution from a small selection space, and uses a 
minimization problem as the selection criterion. 

Another form of the conjugate gradient algorithm discards the direction vectors 
and reveals it to be a 2nd order method. 

*£n+l — Q-nXn < Pn%n t (-^ PnJ^n — 1 

This version has several sources. One builds the coefficients recursively, and is at- 
tributed to Engeli, Ginsberg, Rutishauser and Stiefel [16] by [63], and to Rutishauser 
alone by [42]. Hestenes [29] cites a form called paratan with a geometric interpre- 
tation and explicit coefficient formulas [48] . 

IK-i|| 2 ||r n || 2 



||r„_ 1 ||2(r^r„) + ||r„||2(r;_ 1 ^r„) 

o = \K-i\\ 2 (r* n Ar n ) 

Pn ||r n _i|| 2 (r*^r n ) + ||r n |P(r;_ 1 4r n ) 

These formulas too can be discarded because theorems say they make ||e n +i||.A 
globally minimal, and therefore locally minimal. In this way a simple minimization 
criterion concisely replaces many elaborate formulas. This interpretation succinctly 
characterizes both the conjugate gradient algorithm and its cousin. 

Simplest Conjugate Gradient Algorithm. Iterate from xq. 

minimize ||e n+ i||^ over span {r n x„ x„_i} 
so coefficients of x n and x n -i sum to 1 

Simplest Conjugate Residual Algorithm. Iterate from xq. 

minimize ||r„ + i|| 2 over span {r n x n X n -i} 
so coefficients of x n and x n -i sum to 1 
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The starting point for a simpler version of orthomin(m) 



^n Tn / 



Pn-i A Ar n 



n—i 



Pn-r n 2.^ n .*A*Ar> p 

_ 1 Pn—i ■"■ -riPn—i 



_ , Pn A r n 

X n +1 — x n H . , Pn 

is the work of Elman [14] and Eiscnstat, Elman and Schultz [13]. They show if the 
Hcrmitian part of A is positive definite, then x n+ \ minimizes ||r n +i|| within 

Xn-mAO + Span {p n - m ■■■ Pn-1 Pn-1 Pn} 

and the coefficient in the formula for x n +i can't vanish. The notation is rigorously 
correct because things with negative subscripts vanish and the wedge in x n - m /\o 
means the maximum of n — m and 0. The formula for p n means r n can replace p n 
inside the span. The formula for x n+ \-j means (x n +\-j — x n —j) can replace p n -j- 
With these substitutions the affine space becomes 



Xn-mAO + span 



\Xn-\-l — m X n — m ) . . . [X n — i X n — 2) \X n X n —\) T ri 



from which (x n +\-j — x n -j) vanishes if n — j < 0. The x-coordinates of each vector 
in the span sum to zero, and the affine space adds x„_ mA o, so the ^-coordinates of 
each vector in the affine space sum to 1. 

Simplest Orthomin(m). Iterate from xq 

minimize \\r n+ i\\ over span {r n x„ x n -\ x n - 2 ■■■ x n - m } 
so coefficients of x n x n -\ x n -2 ■ ■ ■ x n - m sum to 1 



This definition of orthomin(m) may be new, but the simplest form of gcr(fc— 1)/ 
gmres(A;) is well known and needs no derivation. Table 1 allows side-by-side com- 
parison of the simplest versions of all these methods for the first time. 

These versions are proposed as archetypes for study and use. Each prescription 
is a simple linear recurrence with coefficients that repetitively solve a simple mini- 
mization problem. Confusion and duplication of effort are unlikely because identity 
and functionality are clear at a glance. Straightforward solution of the minimization 
problems affords easy comparison and substitution of methods. 
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Table 1. Simplest prescriptions for the survey's completely specified al- 
gorithms. Section 3 provides further explanation. 



conjugate gradient 

minimize Hen+iH^ 

over span \r n x n x„_i} 

so coefficients of x n and X n -i sum to 1. 

conjugate residual 

minimize ||r n+ i|| 2 

over span {r n x n x n -i} 

so coefficients of x n and x n -\ sum to 1 

orthomin(?7i,) 

minimize ||r„ + i|| 2 

over span {r n x n x n -\ x„_ 2 . . . x n - m } 

so coefficients of x n x n -\ x„_ 2 ■ ■ ■ 

gcr(fc— l)/gmres(fc) 

minimize ||r„+i|| 2 

over span {i„ r n Ar n A 2 r n ... A k 

so coefficient of x n equals 1 



4. Inhomogeneous Methods 

New methods can be derived by further simplifying the algorithms of Table 1 . The 
resulting algorithms apparently cannot be analyzed by traditional theory nor is a 
new theory offered here. These algorithms do simplify Section 5's presentation of 
more important generalizations. 

A common feature of all the algorithms in Table 1 is the constraint that the 
x-coefficients sum to 1. That is, the next solution equals a linear combination of pre- 
vious solutions and other things, in which the coefficients of the previous solutions 
sum to 1. This constraint is called the consistency condition, but homogeneity con- 
dition more accurately describes its use. With it, the formula for the next solution 
can be multiplied by A and subtracted from y to make an homogeneous recurrence 
for the residuals, and this can be multiplied by A^ 1 to make a similar recurrence for 
the errors. If the recurrence formulas are used to produce polynomials rather than 
vectors, then all the residuals and errors can be obtained formally, by multiplying 
the initial residual and error by these so-called residual polynomials evaluated at 
the matrix A. 

The entire convergence theory of iterative methods rests on residual polyno- 
mials. Convergence to the solution of Ax — y depends on both A and y, but the 
homogeneity condition allows a separation of variables in which the entries of A 
are more prominent than the entries of y. Convergence is equivalent to the resid- 
ual polynomials having small values at the matrix eigenvalues. The incompletely 
specified algorithms in Section 2a need parameters that make the polynomials small 
independent of y. The completely specified algorithms in Sections 2b and 2c choose 
parameters that make the polynomials small in norms weighted by the entries of y. 
In both cases, convergence depends strongly on A and weakly on y. 
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When the algorithms are stated so simply as in Table 1, however, there is clearly 
no reason to impose homogeneity. It is a theoretical convenience for convergence 
analysis that is superfluous to the algorithms. Completely new algorithms can be 
derived by removing the constraint. Table 2 presents these even simpler, inhomo- 
geneous algorithms. Henceforth, the original algorithms are called homogeneous. 



Table 2. Inhomogeneous, simplest prescriptions for the survey's com- 
pletely specified methods. Section 4 provides further explanation. 



un-conjugate gradient 

minimize ||e n+ i||^ 
over span {r n x n x„_i} 



un-conjugate residual 

minimize ||r n+ i|| 2 
over span {r n x n X n -i} 



un-orthomin(m) 

minimize ||r n+ i|| 2 

over span {r n x n x n -\ x n -2 ■ ■ ■ x n - m } 



un-gcr(/c— l)/gmres(fc) 



minimize ||r n+ i|| 2 

over span {x n r n Ar n A 2 r n 



ifc-i„ 



Figure 3 shows that the new, inhomogeneous methods may converge when the 
old, homogeneous methods do not. The selection criteria evidently find smaller 
minima when the selection spaces grow by removing the homogeneity constraint. 
This explanation is too simple, however, because it does not characterize the new 
convergence rate. This difficult question is not addressed here beyond the following 
comments. First, terminating algorithms already make globally optimal choices, 
so removing the constraint should not change them, at least in exact arithmetic. 
Second, if the recurrence coefficients of non-terminating algorithms converge to 
constant values, then the recurrence formulas must be homogeneous in the limit. 
In particular, algorithms with constant coefficients must be homogeneous. 
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Figure 3. 2-norm relative residuals for homogeneous (dashed) and inhomogeneous 
(solid) gcr(k—\)/gmres(k), k = 1, 2, ..., 10, applied to one system. The two 
methods perform alike except for k = 5 when the original, homogeneous method 
stagnates and the new, inhomogeneous method converges. Appendix 2 and Section 4 
explain the calculations. 
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5. Operator Coefficient Methods 

This paper's major observation is that new iterative methods can be derived by 
combining the algorithms of Table 2. The easiest way to join the algorithms is to 
amass their selection spaces, as follows. 

The basis vectors naturally fit into a tableau. Those of the conjugate gradient 
algorithm and the conjugate residual algorithm occupy a corner, those of ortho- 
min(m) add a row, and those of gcr(fc— l)/gmrcs(fc) fill a column. 

X n X n —\ X n — 2 ■ • • %n—m 

r n 
Ar n 



A 



fc-i. 



Orthomin(m) apparently gains its advantage over the conjugate residual algorithm 
by keeping more old solutions. It is likely the vectors of gcr(fc— l)/gmres(/c) can be 
kept with some advantage too. The tableau does have room for many more. 



Xn X n —\ X n — 2 

r n r n _i r„_ 2 

Ar n Ar n -i Ar n _ 2 

A k - X r n A k - 1 r r ^ 1 A k - X r n _ 2 





X n - 


-m 




r n - 


-m 




Ar n . 


- m 


... A k 


' n- 


-m 



The remainder of this paper demonstrates that better iterative methods can be cre- 
ated by placing some or all of these vectors into the selection spaces. The following 
definition of the new methods involves a change of notation because the tableau 
loses one column. 

Oc(k, m), Operator Coefficient Methods of Degree k and Order 

m. Begin from xq and optionally from X-i, x- 2 , ■ ■ ■ , x\- m , and choose 
x n to 



from among 



minimize Wn\\ wna ^ ever or whatever 



span < 



$n— 1 




X n -2 




•En — rn 


r n -i 




T n -1 




Tn — rn 


Ar n -i 




Ar n -2 




-t^Tn — m y 


A k - X r n ^ 

< 


A k 


~ X r n -2 


. A k 


-i r 

1 n — rn 

> 



and, in the homogeneous case, choose x n so the x-coefheients sum to 1 . 

The definition above introduces a name for a generic class of old and new 
algorithms. Three aspects need further explanation. First, it isn't necessary to 
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employ all the vectors in the span. Some old algorithms do not. Second, the 
selection criteria is unspecified because there are so many possibilities. Some are 
explored in later sections. Third, since the recurrence formula has order to, it is 
possible to begin from to initial guesses, xo, X—i, x_2, . . . , x\- m . In this case the 
operator coefficient method is not a Krylov space method. 

Operator coefficient methods include many known iterative algorithms. For 
example, gcr(fc— l)/gmres(fc) is an homogeneous oc (k, I) method minimizing the 2- 
norm of the residual. Orthomin(m) is an homogeneous oc (1, to+1) method that also 
minimizes the 2-norm of the residual and has only the latest residual in the selection 
space. The conjugate gradient and conjugate residual algorithms are homogeneous 
oc (1, 2) methods that minimize various norms and also have only the latest residual 
in the selection space. It would be interesting to find a polynomial-based iterative 
algorithm that is not an operator coefficient method. 

Methods that employ the entire (k + 1) x to tableau may greatly reduce the 
matrix- vector multiplications needed to solve equations to prescribed accuracy. The 
convergence rate of truncated orthomin generally improves as the order, m, in- 
creases. Orthomin is a 1st degree method, k = 1, and similar behavior may be ex- 
pected for higher degree methods, k > 1. Figure 4 shows convergence significantly 
improves by increasing to and fixing k. In this case the matrix- vector multiplica- 
tions for each step are independent of m and are identical to those of gcr(fc— 1)/ 
gmres(fc). Thus, convergence quickens by solving larger minimization problems but 
by performing the same matrix-vector multiplications per step. Faster convergence 
means fewer steps, and fewer matrix-vector multiplications overall. This subject is 
discussed again in Section 7. 

Reducing matrix-vector multiplications is a significant achievement because 
they can account for most of the computational work. When the matrix is randomly 
sparse, then matrix-vector multiplications perform random memory accesses which 
are comparatively slow. If the matrix is not explicitly known, as in matrix-free 
solution of ordinary differential equations [4], then matrix- vector multiplications 
require numerical differentiation of functions whose evaluation may be very slow. 

Several recurrence formulas are associated with an operator coefficient method. 
The selection criterion 

minimize 1 1 r n|| whatever or whatever 
chooses a coefficient tableau 
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Figure 4. 2-norm relative residuals for homogeneous (dashed) gcr(5)/gmres(Q) 
and inhomogeneous (solid) oc(6,m), m = l 2 ... 10, applied to the same system. 
Appendix 2 and Section 5 explain the calculations. 
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Parenthetical subscripts in this and other formulas indicate dependence on n. The 
next residual can be obtained by a similar formula 

m km 

r n = Yl C (0,j)r n -j ~ IZX! C (i,3) Atr n-j + fn 
j=l i=l j=\ 



f n = y-Y, c (oj)y 

which can be written as a recurrence formula of order m 

r n = P(t)(A)r n -i + P( 2 ){A)r n -2 H h P( m )(A)r„_ m + /„ 

whose coefficients are operators, that is, are polynomials of degree k 

P(j)(X) = c(p,j) - C(i, 3 )X - c { 2, 3 )X 2 c ( k,j)X k 

evaluated at the matrix A. Whence the name, operator coefficient method of degree 
k and order m. If C( .i) + c (o.2) + • • • + C(o, m ) = 1, then the /„'s vanish and the 
residuals satisfy homogeneous recurrence formulas. Whence homogeneous and in- 
homogeneous methods. In the homogeneous case, the residuals can be expressed 
succinctly in terms of the initial residuals 

r„ = Pn,i(A)r + P n>2 (A)r_i + P„, 3 (A)r_ 2 + • • • + P n , m (A)ri- m 
by means of residual polynomials, P n j(X), generated from the recurrence formulas 

Pn.j = P(l)Pn-l,j + P(2)Pn— 2,j + ' ' ' + "( m )P n _ TO) j 

with initial values P\-j t j = 1 and others zero. This representation for r n is numer- 
ically correct, however, only if the recurrence formulas are stable when X = A. 
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6. Implementations 

The implementation of inhomogencous operator coefficient methods that minimize 
the 2-norm of the residual is considered here. This section has two parts. The 
first analyzes implementations of older methods, the second describes a reasonable 
implementation for all oc (fc, m) methods. Those who are interested in the new 
methods may prefer to read the notation below and to begin at Section 6b. 

Inhomogeneous oc (k, to) methods that minimize the 2-norm of the residual 
perform the following task at step n. They choose x n — V n c n , where c„ solves the 
least squares problem 

min \\y - AV n c n \\ 2 , 

and where V n 's columns are a basis for the selection space. The natural basis for 
the full-tableau method is the following. 



span {V n } — span < 





%n — 1 




x n-2 




•^n — rri 




r„_i 




r n -2 




'n—m 


A k- 


_1 r-„_i 


A k- 


^r.n-2 


■ A k - 


-i r 

'7i — m 

> 



Any implementation makes three choices. The first is the basis for the selection 
space. This basis becomes the columns of V n . The second choice is the basis for the 
least squares problem. This might be the columns of AV n . The third choice is the 
process to solve the least squares problem. All the choices affect both numerical 
accuracy and computational efficiency. The bases might overlap to conserve storage, 
for example, or they might facilitate the solution process to conserve time. The 
chief numerical considerations are the accuracy of the bases and the accuracy of 
the least squares solution. A comparative analysis of all the possibilities is beyond 
the scope of this paper. Ashby, Manteuffel and Saylor [3], Saad and Schultz [46], 
Walker [56] [57] and references therein should be consulted for more implementation 
ideas. 

6a. Some Existing Implementations 

This section analyzes implementations of known methods. It reverses Section 3's 
simplification process, and rebuilds the algorithms with explicit justification for 
each implementation detail. 

Like more general operator coefficient methods, gcr(fc— 1) and gmres(/c) solve a 
least squares problem, but their selection space is smaller and affinc. They choose 
x n = x n -i + V n c n , where c n solves 

min||r„_i - AV n c n \\ 2 , 



and where 



span {V n } — span < 



r„_i 



Ar 



A*" 1 ^-! 
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The gcr implementation of gcr (fc—1) /gmres (fc) maintains two separate bases. It 
uses AV n for the least squares basis, and it chooses V n to be A*A-orthogonal. Evalu- 
ation of the inner products during the orthogonalization process requires either that 
the columns of AV n be saved, or that additional matrix-vector multiplications be 
performed. Since AV n is Euclidean-orthogonal, the normal equations are diagonal 
and are easily solved. However, normal equations may solve least squares problems 
with accuracy less than best. 

The gmres version of gcr(fc— l)/gmres(fc) may be the most efficient for this 
method. It chooses an Euclidcan-orthonormal basis for span {V n } that becomes an 
orthonormal basis for span {l^i}+ span {ylV^} by the inclusion of one more vector. 
That is, AV n = W n H n where W n has the columns of V n plus one, and where the 
small matrix H n is constructed along with the basis. H n represents A under an 
orthonormal change of basis. If the change of basis can be computed accurately, 
then H n can be used to solve the least squares problem accurately. 

It is not clear whether so efficient an implementation is possible for more general 
operator coefficient methods. The oc (fc, 1) methods have the advantage of simplicity 
because their selection spaces involve a single group of nested Krylov spaces. 

span {r„_i} = K C Ki C ■ ■ ■ <Z K k _\ = span {r„_i Ar„_i . . . ,4 fe_1 r„_i} 

More general oc (fc, to) methods have several Krylov spaces and so may not attain 
the efficiencies of the 1st order, to = 1, methods. 

The usual practice with oc (fc, 1) methods is to recursively build orthogonal 
bases for the nested Krylov spaces, 

K j+ i = span {p p x ... p 3 Pj+i} = span {Kj Pj+i} , 

in which Pj+i is orthogonal to Kj. Restarted gcr employs A* A orthogonality, while 
restarted gmres chooses Euclidean orthogonality. Orthogonality is desired for two 
reasons. It is generally believed orthogonal bases provide better numerical repre- 
sentations for their spans, moreover, orthogonality can help solve the least squares 
problem. 

The experience with orthomin(TO— 1), an homogeneous oc (1,to) method, sug- 
gests orthogonality has a third use. It may provide efficient implementations of high 
order, to > 1, methods. Like gcr/gmres, orthomin(TO— 1) chooses x n = x n -\ + V n c n 
where c„ solves 



AV n c n \\ 2 , 



but in this case 



span {V n } = span < »"n-l {x n -i — Xn-2) ■ ■ ■ {x n -(m-l) — %n-m) 

Orthomin(ro— 1) represents this selection space by an inventory of A* A-orthogonal 
basis vectors, 

V n = [p n -1 Pn-2 ■■■ Pn-m] ■ 

Each step maintains the basis by discarding the oldest vector and inserting the 
residual's component orthogonal to the others. As with gcr, yl'M-orthogonality 
results in diagonal normal equations for the least squares problem. The solution 
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update involves only the newest basis vector because previous steps account for the 
others. In this way, each basis vector spans exactly the difference between a pair of 
successive approximate solutions. 

Some algorithms of Chronopoulos and Gear [5] [6] [7] [8] appear to be more 
general oc (k, m) methods that follow the approach taken by truncated orthomin. 
They build the natural basis for a Krylov space of low dimension, say k, and then 
perform an orthogonalization step to enforce A*A-orthogonality among a number 
of such spaces, say m. Like orthomin, the difference between a pair of successive 
approximate solutions lies in a space of low dimension, in this case k, but in the 
absence of arithmetic error the new approximate solution is the best in a larger 
space. An analysis like the one in Section 3 for orthomin would be needed to 
identify the selection spaces in terms of the natural oc (k, m) basis. 

All implementations that rely on recursively produced, orthogonal bases can 
be expected to share the failing of the original conjugate gradient algorithm. The 
vectors are not orthogonal in practice and, as shown by Figure 1 for the conjugate 
gradient method, they can be quite different from the intended vectors. The loss of 
orthogonality in the basis is readily detected, and obviously affects the accuracy of 
the least squares solutions. The loss of accuracy in the basis vectors is difficult to 
detect, but surely affects the essential character of the approximations. Elaborate 
means such as reorthogonalization can remedy the orthogonality, but aside from 
producing more nearly orthogonal vectors, they have not been proved to result 
in better approximations to the underlying Krylov spaces. It is an open question 
whether the approximations can be made consistently better. The natural bases 
have been observed to be badly conditioned [7] [56], so linear transformations that 
make them better conditioned evidently must be ill-conditioned too, and thus must 
be difficult to apply accurately. 

The least squares problem can be difficult to solve however it is formulated. 
The gcr/orthomin approach may be flawed because it solves the normal equations, 
but other methods must contend with near singularity of the least squares bases. 
The matrix H n of the gmres approach has a 2-norm condition number no worse 
than A's, but by being smaller it may reflect ill-conditioning more. Alternatively, 
the natural bases of Krylov spaces can be very nearly singular. 

It is difficult to concede that any but the best solution method should be 
applied to the least squares problem. For least squares problems in general, "the 
only fully reliable way to treat rank deficiency is to compute the singular value 
decomposition" [26 , p. 170]. Apparently no iterative methods heed this advice. Yet 
the singular value decomposition is fairly inexpensive for the small matrices that 
appear in restarted gmres, for example, and may remove some of the difficulties 
occasionally reported for this method [56] [57]. 

In summary, existing implementations always employ orthogonal bases to re- 
duce storage and computation. The savings in storage appear to be at most a factor 
of two, as for gmres versus gcr. This improvement is marginal on present-day com- 
puters and should not govern the choice of implementations. The savings in time 
may be more significant. 

The advantages of orthogonal bases must be weighed against numerical con- 
cerns. The ^4*A-orthogonal bases impose inferior least squares solution methods. 
In the presence of rounding error moreover, it is known that recursively generated 
bases may not accurately span the intended spaces. The effects of this on nonter- 
minating, iterative algorithms are largely unexplored. 
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6b. An Implementation 

The following implementation is generic to all oc (k, m) methods. It solves the least 
squares problems by the singular value decomposition, the best available method, 
and uses the natural bases for the Krylov spaces. This implementation is offered 
both as a research tool and as a model of programming simplicity. It has several ad- 
vantages. First, the implementation allows easy substitution of methods, including 
restarted gcr/gmres and truncated orthomin. Second, it addresses some numerical 
difficulties likely to trouble both old and new methods. Third, it conveniently relies 
on well-known numerical procedures found in many scientific computing libraries. 
This implementation, applicable to all oc (k, m) methods and devoid of program- 
ming complications, may be the most appropriate in the present, early stages of 
development. 

With the natural basis, all algorithms have the same implementation but for 
the choice of basis vectors. Methods such as conjugate residual and orthomin that 
don't use the full tableau can be implemented by simply choosing a subset of the 
larger basis. The columns of AV n for the specific V n of interest must be constructed 
explicitly. Most can be borrowed from previous steps. Only the vectors Ar„_i, 
A 2 r n -i, . . . , j4 fe r n _i associated with the most recent solution are new. They require 
k matrix- vector products. The vector Ax n -i, which also forms r„_i, requires one 
more matrix- vector product or can be obtained recursively. 

The least squares solution process is numerically robust. The singular value de- 
composition solves the least squares problem more accurately, though perhaps more 
expensively, than orthomin-like implementations would solve the normal equations. 
Errors can enter the least squares basis only through the matrix- vector multiplica- 
tions which produce AV n from V n . 

Operator coefficient methods should alleviate the concern that the natural bases 
for Krylov spaces are too nearly singular. High order operator coefficient methods 
make high degree Krylov spaces unnecessary. If very high degrees are needed, then 
Euclidean-orthogonal bases may be computed in the manner of Arnoldi, and may 
be integrated into the computations. 

The following steps compute the minimum norm solution of the least squares 
problem. They are based on recommendations in the text of Golub and Van Loan 
[26 ]. First, the columns of AV n should be scaled to have unit 2-norms. This 
makes K2(AV n ) nearly minimal and improves numerical accuracy. Scaling also 
avoids numerical overflow and underflow when a matrix repeatedly multiplies a 
vector. Second, Householder transformations should reduce AV n to an upper tri- 
angular matrix. This reduces the arithmetic costs when, as here, there are many 
more rows than columns. Third, the singular value decomposition of the small, 
upper triangular matrix must be computed. Fourth, the minimum norm solution 
of the column-scaled least squares problem can be approximated by applying an 
approximate pseudoinversc obtained by ignoring small singular values. Singular 
values smaller than machine round-off relative to the largest singular value might 
be discarded, or more sophisticated methods might be used to determine numerical 
rank. Finally, the unsealed minimum norm solution, c„, combines the columns of 
V n to produce the next approximate solution for the oc (k,m) method, x„ = V n c n . 
Table 3 restates these steps and counts their arithmetic operations. 

With t basis vectors, t < (k + l)m, the implementation needs the following re- 
sources per step of the inhomogeneous oc (k, m) method. There are k matrix- vector 
multiplications, and one more if residuals are not recursively computed. Memory 
space of 2t vectors is needed to store V n and the Householder transformations that 
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Table 3. An implementation of oc (k, m) methods minimizing the 2-norm 
of the residual for a selection space basis of size t < (k+l)m, with operation 
counts. Terms independent of the matrix order N arc omitted. Section 6b 
provides further explanation. 

step operations 



scale AV n to unit column norm, 3tJV 

V := AVnD- 1 

Householder transformations reduce 2t 2 N 

V to triangular form, R := QV 
orthogonal projection of y, z := Qy AtN 
singular value decomposition of R 

solves min \\RDc n — z\\2 
assemble next solution, x n :— V n c n (2t — 1)N 

optionally assemble Ax n and r n 2tN 



reduce AV n to triangular form. Very compact memory management schemes are 
possible since the basis vectors pass from one step to the next but the Householder 
transformations do not. Table 3 counts (2i 2 + 9t — 1)N arithmetic operations, from 
which terms independent of the matrix order, N, have been discarded. The residual 
calculation requires either N or 2tN operations. 

This implementation repeatedly solves a large, dense, overdetcrmincd, singular, 
least squares problem. This task is basic to numerical linear algebra. The House- 
holder reduction and the singular value decomposition already appear in many 
computing libraries, and solution methods tuned to specialized computer archi- 
tectures are being developed. The implementation therefore improves, in a sense 
automatically, with advances to numerical software and hardware. 



7. Varying Coefficients 

Many operator coefficient methods dynamically select recurrence coefficients by 
minimizing the 2-norm of the residual. This selection criterion is examined here. 
Theorem 1 and its Corollary prove convergence for a large class of matrices dis- 
tinguished by a simple polynomial relationship. Moreover, experimental results 
indicate there is a trade-off between degree and order. This may allow high order 
methods to replace comparatively less economical high degree methods. 

Theorem 1. If the Hermitian part H of P(A) is positive or negative 
definite for some polynomial P with degree at most k and P(0) = 0, then 
for every x n the afhne space 

x n + span {r n Ar n A 2 r n ... A k ~ 1 r n ) 

contains a vector x n +i with \\r n +i\\2 < p ||?"n||2 where 



HA(g)| 



< 1. 
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The affme space also contains a vector x n +i, usually different from the 
first, with ||e n +i||2 < p ||e n ||2 (proof appears in Appendix 1). 

Corollary to Theorem 1. If the Hcrmitian part H of P(A) is positive 
or negative definite for some polynomial P with degree at most k and 
-P(O) = 0, then oc (k, m) methods whose selection spaces contain the afEnc 
space 



span {r„ Ar n A 2 r n 



A 



fc-i„ 



.} 



converge for the selection criteria that minimize the 2-norm of either the 
residual or the error. At each step the norm declines by at least the factor 



A- 



min \X(H)\ 
\\P(A)h 



< 1. 



The thesis of Elman [14] is the inspiration for Theorem 1. The Corollary applies 
to gcr(fc— l)/gmres(fc) as well as to more general methods. However, only the case 
in which A itself is positive or negative definite appears to have been published pre- 
viously, by Eisenstat, Elman and Schultz [13]. Saad and Schultz mention this case 
too, and present more detailed convergence results for diagonalizable matrices [47]. 

Theorem l's bound on the convergence rate may be weak because it is inde- 
pendent of the recurrence order m. The Theorem minimally assumes each residual 
r n equals a linear combination that includes vectors from 



span {r„_i Ar n -\ A 2 r 



n-l 



A fc r„-i} 



but the combination also may employ vectors from the larger space 



span < 



r n -i 


r n -2 


Tn—m 


Ar n -i 


Ar n - 2 


■™->n—m 


< 


A k r n - 2 ■ 


■™- T*n — m 

> 



Thus, r n depends on powers of A up to A kr< 
may vary with the product km. 



This suggests the convergence rate 



Figure 5 provides numerical evidence for this interpretation. The Figure ex- 
hibits level curves of observed convergence rates as functions of k and m for the 
convergence histories shown in Figure 6. The level curves have the expected quali- 
tative behavior. In this example, the oc (6, 1) and oc (3, 5) methods have essentially 
the same convergence rate. This means they achieve the same accuracy in the same 
number of steps, but the oc (3, 5) method requires half the matrix-vector multipli- 
cations. 

This apparent trade-off between k and m may be the most important aspect of 
oc (k, m) methods. It allows the beneficial effects of larger k in gcr(fc— l)/gmres(fc) 
to be realized more economically with larger m. The most efficient choice of k 
and m can be expected to change with the computer and the problem, and with 
the expense of solving each step's minimization problem relative to the expense of 
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Figure 5. Level curves of observed convergence rate for inhomogeneous oc(k,m) 
as a function ofk and m for one system. The curves range in multiplicative steps of 
10 ' 05 from 10" 1 at the upper right to 10°, no convergence, at the lower left. The 
left edge corresponds to gcr(k-l) /gmres(k) , the bottom edge to orthomin(m—l) . 
Figure 6 displays the convergence histories. Appendix 2 and Section 7 explain the 
calculations. 
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Figure 6. Convergence histories from which the level curves of Figure 5 are derived. 
Appendix 2 and Section 7 explain the calculations. 
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performing matrix-vector multiplications. If the convergence rate p did vary only 
with km, for example if 

p{k,m) — p{k/£, £m), 

then low degree, high order methods would be more economical. Minimizing the 
2-norm of the residual involves solving a least squares problem with a basis of size 
(fc+ l)m whose computation and memory requirements are roughly constant among 
oc (fc, m) methods with the same km. But oc (A:, m) performs k or k + 1 matrix- 
vector multiplications per step, and this part of the total cost decreases with k. 
Considerations of this kind can be expected for all manner of coefficient choices. 

The frequently used minimization criteria of the kind in Theorem 1 are a power- 
ful but imprecise tool for choosing coefficients. They may not find coefficients that 
produce convergence, and even when they do, they may not produce the fastest 
convergence. Moreover, the Theorem's bound may be a poor estimate for the con- 
vergence rate because the selection criteria minimize norms of vectors, but the 
bound employs norms of matrices. When these matrix-vector norms are applied to 
matrices, they depend on both the eigenvalues and eigenvectors, while Theorem 2 
in the next section shows convergence can depend on the eigenvalues alone. 

The following example illustrates these concerns. The matrix 

a [3 
a j3 



has eigenvalue a and its Hermitian part has eigenvalues between Real (a) ± |/3|. 
Among oc (1, m) methods, a stationary Richardson's 1st order method can be made 
to converge independent of [3, yet f3 can be chosen so the Hermitian part is indefinite 
and Theorem l's bound is ineffective. 



8. Constant Coefficients 

This final section determines exactly when operator coefficient methods with con- 
stant coefficients converge. This information has several uses. First, it may guide 
the choice of k and m needed to achieve convergence with coefficients selected by 
any means. Second, it suggests ways to select coefficients other than by the usual 
minimization criteria of Section 7. Finally, it proves that some operator coefficient 
methods are new by showing they converge when previously known methods do not. 

The Chebyshev iteration and the stationary 2nd order method are oc(l,2) 
methods that converge only for matrices with eigenvalues inside ellipses that ex- 
clude 0. It is demonstrated below that some constant coefficient oc(l,2) methods 
converge for non-elliptical eigenvalue distributions. These, then, are new methods. 

Only the homogeneous case is possible for constant coefficients. As remarked in 
Section 4, with the approximate solutions converging to x*, and with the residuals 
converging to 0, the sums of x-coefficients on both sides of the recurrence equation 
must balance. 
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Theorem 2. A constant coefficient, homogeneous, operator coefficient 
method of degree k and order m 

m km 



with coefficient tableau 



CQ,l Co,2 • • • Co,., 
Cl.l ci 2 • • • ci . 



Cfcl Cfc 2 • • • Cfc. r . 



converges to a solution of Ax = y for all y and all initial vectors xq, 
X-i, . . . , x\- m exactly when, for each eigenvalue X of A, the maximum 
magnitude r(A) of the roots X of the polynomial 

P(X,X) = X m - P 1 (X)X rn - 1 - p 2 {\)x m - 2 p m {\)x m - m 

with coefficients given by the columns of the tableau 

Pj(X) = coj - cijX -C2,jX - ■ ■ ■ - c/cjA 

is strictly less than 1. Moreover, there is a bound upon the residuals for 
all y and all initial vectors xq, X-i, . . . , x\-. m 

lkr.ll < ( Ikoll + lk-i|| + • • • + llr-i_Tr.ll ) Q(n) R n , 
and if A is nonsingular there is an identical bound upon the errors 

\K\\ < ( ||eo|| + ||e_i|| + • • • + ||ei_ m || ) Q(n) R n . 

R is the maximum r(A) for all the eigenvalues of A. Q(n) is a polynomial 
that depends on the norm, on A, and on the coefficient tableau. The norm 
may be any consistent matrix-vector norm (proof appears in Appendix 1). 

There is some evidence that constant coefficients may work well in the long run. 
The Chebyshev iteration's coefficients converge to values for which the stationary 
2nd order method converges identically [38]. To the extent coefficients chosen by 
some means do become stationary, Theorem 2 explains the minimal k and m nec- 
essary before the coefficient selection criteria can make oc (k, m) methods converge. 
An entirely constant coefficient iteration might be useful when many systems of 
equations feature the same matrix. The trick is to find the coefficients, and Theo- 
rem 2 is the first step in this direction. 

The following example suggests how constant coefficients might be found, and 
clarifies the statement of Theorem 2. Figure 7 shows that coefficients chosen to 
minimize the residual's 2-norm can be nearly constant over several iterations. When 
dynamically chosen coefficients remain fixed for a time, then a constant coefficient 
iteration with these fixed values may converge. To see if this is the case here, 
Figure 8 superimposes the matrix eigenvalues, as black dots, over some level curves 
of the Theorem's eigenvalue-specific convergence rate 

r(A) = maximum \X\ of all X for which P(X,X) = 0, 
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Figure 7. Coefficients of inhomogeneous oc (2, 2) minimizing the 2-norm of the 
residual for one system. Appendix 2 and Section 8 explain the calculations. 
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Figure 8. Eigenvalues of the matrix (solid dots) superimposed on some level curves 
of the convergence rate r(A) for the constant coefficient regime of Figure 7. The 
levels start at 1 on the boundary and decrease in steps of 0.05. Appendix 2 and 
Section 8 explain the calculations. 
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where P(A, X) is the Theorem's polynomial 

rn 

P(X,X) = X m -y]co tj X m -i + > ]} ]c itj X l X r 



rn 


^ m " 


A; rn 


J=l 




i=\ j=l 



and where the polynomial's coefficients Ci. j are taken from the nearly constant 
regime of Figure 7. The convergence domain is the set of A for which r(A) < 1. 
Figure 8 shows all the eigenvalues lie within the convergence domain, so Theorem 2 
guarantees convergence with these constant coefficients. Figure 9 exhibits the rela- 
tive residuals for the original right hand side and one other in a constant coefficient 
iteration. 

A phenomenon discovered by Trefethen [51] explains why Figure 9 exhibits 
slower convergence than Figure 8 predicts. When the matrix eigenvalues are sen- 
sitive to perturbation, then convergence depends on an envelope of approximating 
eigenvalues introduced by rounding error. Theorem 2 must be applied to these ap- 
proximations to predict the convergence rate. Nevertheless, convergence is assured 
in Figure 9 because a convergent iteration that already accounts for the envelope 
suggests the constant coefficients. 

For a given matrix even with known eigenvalues or approximations thereto, 
it can be difficult to find any convergent coefficients let alone optimal ones that 
minimize Theorem 2's convergence rate, R = max r(A). The inverse problem of 
finding eigenvalue domains convergent for given coefficients is at least numerically 
straightforward. It amounts to seeking the A for which all the roots X of the 
Theorem's polynomial P(X,X) have magnitude less than 1. 

Figure 10 displays the convergence domains for some arbitrary coefficient tab- 
leaux as large as 3 x 2, that is, for methods up to oc (2, 2). 

Co,i c 0i2 = 1 - c 0]1 

Cl,l Ci, 2 
C2,l C 2: 2 

Table 4 lists the specific coefficients. 



Table 4. Constant coefficient tableaux for the convergence domains pic- 
tured in Figure 10. Section 8 provides further explanation. 
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Figure 9. 2-norm relative residuals for the iteration of Figure 7 (lower solid line), 
and for a constant coefficient iteration with the same right hand side (higher solid 
line) and a different right hand side (dashed line). Appendix 2 and Section 8 explain 
the calculations. 
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Figure 10. Level curves of convergence rate r(X) as a function of X in the complex 
plane for Table 4's constant coefficients. The levels begin at 1 on the boundaries 
and decrease in steps of 0.05. Appendix 2 and Section 8 explain the calculations. 
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As explained in the survey of Section 2, Richardson's 1st order method, oc (1, 1), 
has a circular convergence domain. Figure 10a shows the typically elliptic domain 
of the stationary 2nd order method, oc (1, 2). 

Figures 10b and 10c prove that some operator coefficient methods are new. 
Oc(l,2) methods require just one matrix-vector multiplication per step, and yet 
fully populated 2x2 tableaux have non-elliptic convergence domains because the 
relationship that P(X,X) = creates between A and X, 

X 2 - co,iX - c , 2 

C 1}1 X +Ci )2 

generally does not map circles in X to ellipses in A when ci,2 ^ 0. The domain 
in Figure 10b closely abuts the imaginary axis, a difficult feat for the Chebyshev 
iteration's ellipses. The crescent-shaped domain in Figure 10c actually crosses the 
axis, an impossible feat for the Chebyshev iteration's ellipses. 

The remaining plots in Figure 10 illustrate the possibilities of tailoring the 
domain. Figure lOd is for a stationary gcr(l)/gmres(2) method, oc (2, 1). Figure lOe 
shows introducing just the previous iterate, oc (2, 2), produces a square convergence 
region. Figure lOe is for a fully populated, 3x2 tableau. 
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Appendix 1. Proofs 

This appendix proves the theorems cited in the text. 

Theorem 1. If the Hcrmitian part H of P(A) is positive or negative 
definite for some polynomial P with degree at most k and -P(O) = 0, then 
for every x n the afEne space 



x„ + span {r n Ar n A 2 



A 



fc-i„ 



J 



contains a vector x n+ \ with ||r„+i||2 < p|| r ™||2 where 



p=\n- 



\KH)\ 



\P{A)\ 



< 1. 



The affme space also contains a vector X n +\, usually different from the 
first, with ||e n+1 || 2 < p||e„|| 2 - 

Proof. This proof generalizes the one Elman [14] and Eisenstat, Elman and 
Schultz [13] use to prove convergence of truncated orthomin. Its application in this 
context, and the part about minimizing the error, appear to be new. 

If r„ =0 then choose x n+ i — x n . Otherwise let P{X) = ciX+c 2 X 2 -\ Yc k X k 

and choose 

x n +i =x n + «(ci + c 2 A H h c k A k ~ l )r n 

for some real a. This expression is special because r n+ \ = [I — aP{A)\ r n so 

lkn+i|| 2 -||r„|| 2 -2 a r„*i7r„+ a 2 ||P(A)r„|| 2 . 

Now, \\P(A)r n \\ ^ because P(A)r n ^ because r n *P(A)r n = r n *Hr n ^ so the 
formula is quadratic in a. The minimum occurs at a — r n * Hr n /\\P(A)r n \\ 2 and is 



K+ill 2 = ||r„|| 2 



T 2 



< r n z - 



min|A(F)| 

\\P(A)\\ 



-i 2 



\P(A)r n 

There is no need to ponder the size of 

min|A(H)| 
\\P(A)\\ ■ 

It must be < 1 because [|r„ + i|| 2 isn't negative. And it must be > because it is an 
absolute value over a norm. But it can't be zero because H is positive or negative 
definite so the numerator won't vanish. 

As for minimizing the error, the choice 

x n+ i = x„ + a(ci + c 2 A H h c k A k ~ x )r n 

also yields e n+ \ = [I — aP(A)\ e n . This vector's norm can be minimized in the same 
manner as the residual's, but the minimum most likely occurs for a different a. How 
different? I 
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Theorem 2. A constant coefficient, homogeneous, operator coefficient 
method of degree k and order m 



X. n — 2_^ c 0,j x n-j + 7 J 7 J c i,jA 



k m 

ra-j 



with coefficient tableau 



c 0,l Co,2 • • • Co, 
Cl.l Ci,2 • • • Ci ,, 



Cfc,l Cfe o • • • Cfe . 



converges to a solution of Ax = y for all y and all initial vectors xq, 
X-i, . . . , X\- m exactly when, for each eigenvalue A of A, the maximum 
magnitude r(A) of the roots X of the polynomial 

P{X,X) =X m - p x {\)x m - 1 - p 2 {\)x m - 2 p m {\)x m - m 

with coefficients given by the columns of the tableau 

Pj(X) = C , j - Cl, j\ - C2,jX 2 c k,]X k 

is strictly less than 1. Moreover, there is a bound upon the residuals for 
all y and all initial vectors xo, X-i, . . . , xi_ m 

lkr.ll < ( IM + ||r_i|| + • • • + ||ri_ m || ) Q(n) R n , 
and if A is nonsingular there is an identical bound upon the errors 

||e n ||<(||co|| + ||e_i|| + --- + ||ei_ m ||)Q(n)iT l . 

R is the maximum r(X) for all the eigenvalues of A. Q(n) is a polynomial 
that depends on the norm, on A, and on the coefficient tableau. The norm 
may be any consistent matrix-vector norm. 

Proof. The Theorem is not trivial because it allows defective A, but the proof's 
main challenges are notation and pruning. There are nine parts. 

Part 1. Much of what is needed to prove the Theorem can be taken from the 
literature of finite differences [39]. The proof depends on sequences {t„} generated 
by constant coefficient, homogeneous, m th order, linear recurrence formulas 

m 
T n = y j r'jTn-j. 
3=1 

The sequences begin from initial values To, t_i, . . . , Ti_ m and the formulas apply 
when n > 1. Any recurrence sequence can be expressed as a linear combination of 
fundamental sequences of the form {n 4-1 // 1 } in which p is a root of multiplicity at 
least i of the characteristic polynomial 

m 

x m -j2 p 3 xm ~ j - 

3 = 1 
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The coefficients in the linear combination for {r„} can be obtained by solving an mx 
m system of linear equations. Each column of the coefficient matrix M contains the 
initial values of a different fundamental sequence, and the column on the right side 
of the equations contains the initial values of {t„}. This formula for the coefficients 
leads to the following bound 



; i i 






KtT-^WM-^Yw^ 



in which n > 1 and r is the magnitude of the largest root p. The power n m_1 only 
occurs in the worst case when the characteristic polynomial has a single root of 
multiplicity m. 

The bound is needed for recurrence sequences whose coefficients Pj are polyno- 
mials of a parameter A. In this case, the characteristic polynomial is the P(A, X) in 
the statement of the Theorem. A technical detail occurs when P m vanishes for some 
A, because then the recurrence has order less than m. The bound remains valid 
for a smaller matrix M and a smaller order m, but it is notationally convenient to 
retain the original m. Thus, for specific polynomials Pj(A) there is a bound 

m 

|r„|<n m -V(A)" g (A)El^l 
i=i 

in which r(A) is the magnitude of the largest root of the characteristic polynomial, 
and q(X) is a number with a rather complicated definition. The r(A) and q(X) 
depend on the recurrence coefficients, that is, on the coefficient tableau of the 
operator coefficient method. 

Part 2. The recurrence formulas generate another collection of fundamental 
sequences, here denoted {7r n .fc} for k from 1 to m. These sequences have only one 
nonzero initial value apiece, namely Tt\-k. k — 1 and others 0. When they represent 
any other recurrence sequence in the manner of Part 1 , then the coefficients in the 
linear combination are just the other's initial values. 

m 
T n = / , Tl-fcTTn, k 
fc=l 

In this way a homogeneous sequence has a closed-form representation terms of its 
initial values. 

Part 3. Adding an extra term ag to rg amounts to beginning a new recurrence 
sequence with initial value ag. The contribution to t?+i is 7Ti ] \ag, the contribution 
to Tg + 2 is 7T2, \(Ji, and so on. The sequence of multipliers is {^ n ,i\- One sequence 
suffices independent of £ because the recurrence formulas have constant coefficients. 
In combination with Part 2 therefore, a nonhomogeneous recurrence 

m 
T~n &n i / -*^ jT~n— j 
j=l 

has a closed-form representation 

n m 

Tn = / ^n-l, lty + 2 , n n,kTl-k- 
£=1 fe=l 
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Part 4. When the recurrence coefficients Pi, P<i, . . . , P m are polynomials of A 
as they are here, then so are the 7r nj fc's. Their recurrence formula 

m 
TTn, k — / v j^ n ~j, k 

3=1 

can be differentiated s times 



» 



»=o j=i 



*\ (i) (,-i) 
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to reveal that the derivatives satisfy nonhomogeneous recurrence formulas. The su- 
perscripts in parentheses denote derivatives of various orders with respect to A. The 
derivative sequences have zeroes for initial values, that is, only the nonhomogeneous 
terms participate in Part 3's closed-form expansion. 



n s m 



e=l i=l j=l 



W n,j ~ Z^ Z^ Z^ ( „• J n n-t,l P j W n-j, k 



This represents each derivative entirely in terms of lower-order derivatives. 

Part 5. The bounds of Part 1 can be applied hrst to the polynomials 7r„. & of 
Part 2 and then to the derivative formulas of Part 5 to obtain 



ni%(\)\<q s (n,X)r(\r 
where q s is a polynomial of n. Part 1 forces the choice 

q a (n,\)=n m - l q(\), 
and the others can be constructed recursively by 
q s (n,X) = nq (n, X)q s -i(n, A)m2 



max 

l<i<s. l<j<~m 



pf(A) 



Only the polynomial dependence on n is important. The bounds aren't sharp and 
needn't be. They have been chosen to increase monotonically with n and s to ease 
the following derivation. 



n s m 



<w\ = I EEE ( ■) ^i(W(«(a) 

£__1 i=l j=l 



n s m 



^ EEE(j«o(n,A)|P, W (A) g,_i(n,A) 
i=i i=i j=i 



< nq (n,X)q s - l (n,X)J2J2(j P J ( A ) 

i=l i=l 

< q s (n,X) 
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Part 6. At this point its customary to cite an unimpeachable source for the 
definition and existence of a Jordan decomposition, J = UA U^ 1 . This allows indi- 
vidual Jordan blocks to be considered. Manteuffel [35] [36] observes that evaluating 
a polynomial tt(X) at a Jordan block amounts to differentiating the polynomial. If 
A is the block's eigenvalue, there results an upper triangular, Toeplitz matrix with 
7P*)(A)/i ! on the i th superdiagonal, as in the example below. 



/ 



V 



A 1 


\ 




A 1 




= 


A 


) 





tt(°)(A)/0! 7r«(A)/l! tt( 2 )(A)/2! 

7r(°)(A)/0! 7r«(A)/l! 

7r(°)(A)/0! 



In this way the bounds of Part 5 can be applied to each Jordan block and then 
combined for all blocks to obtain bounds 

\\7r n , k (A)\\ = \\U- 1 7r n ,k(J)U\\<Q(n)R n . 

R is the largest r(A) for all the matrix eigenvalues. Q(n) is a polynomial that 
depends on the norm, on A, and on the coefficient tableau. The norm should be a 
consistent matrix-vector norm because the next step applies it to both. 

Part 7. Remembering the original use of the 7T nj fe's in Part 2 and the bounds in 
Part 6, the terms of vector sequences {t n } generated from initial values to, i_i, . . . , 
ti—m by a constant coefficient, homogeneous, m order, linear recurrence formula 

tn 
t n = / J "j {■"■)t n —j, 
i=l 

whose coefficients are polynomials of A, are expressed by 

m 

tn = 2 j ^n,k{A)ti- k 

and are bound by 



k=\ 



|tn|| < Q(n)R n J2\\^-k\ 



fc=i 



Part 8. The preliminaries are finished and the theorem's proof begins here. 
Since the operator coefficient method is homogeneous there is a residual recurrence 






and if A is nonsingular this can be multiplied by A 1 to obtain an error recurrence 



m 



■n—j 



and from Part 7 both sequences satisfy the bounds in the statement of the Theorem. 
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Part 9. If A is singular then the Theorem's polynomial for A = is 

(0,A) = A -C ,lA -C 0j2 A C . m X 

which has 1 as a root because co,i + co,2 + • ■ • + co,m = in the homogeneous case, 
so 1 = r(0) < i?. Thus, if i? < 1 then is not an eigenvalue, A is nonsingular, and 
a solution exists. The bound in Part 8 shows the errors converge to zero since 

limQ(n)i?™ = 

n 

whenever Q is a polynomial of n. The Theorem's convergence criterion is therefore 
sufficient. If 1 < R, then from the Jordan block of the eigenvalue for which the 
Theorem's polynomial has a root of magnitude R, it is possible to construct some y 
and initial vectors Xo, X-i, ■ ■ ■ , xi- m so the corresponding residuals satisfy ||r„|| = 
||ro||i? n . The Theorem's convergence criterion therefore is necessary. | 
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Appendix 2. Figure Explanations 

This appendix explains the numerical experiments reported in the Figures. All 
calculations are performed by a Cray XMP with unit roundoff 3.5 x 1CP 15 . The 
initial guess xq for all iterations is 0. 

Figure 1. 2-norm relative errors in the computed basis vectors of the 
conjugate gradient algorithm for a system of order 100. 

Figure 2. Relative A-norm solution errors for the system of Figure 1. 
The upper curve is for single precision and the lower for reorthogonalized 
double precision. 

For Figures 1 and 2 the system Ax — y has diagonal A with entries l 2 , 2 2 , 
.... 100 2 and uniform y with entries 1, 1, . . . , 1. The single precision conjugate 
gradient algorithm appears in Section 2 and has explicit, not recursive, residuals. 
Its deviation from what would be obtained from exact, infinite precision calcula- 
tions is measured by comparison with a double precision version that includes full 
orthogonalization in which 



n * A 

1 Pn—j s±Pn—j 



Pn-r n ^ Vn-3 



replaces 



p n -i*Ar r , 
p n -i*Ap n - 



■Pn-l- 



Figure 1 actually plots \\ph, —ph Ih/lbn lb and Figure 2 plots both ||ek IU/IWU 
and ||e„ ||.A/||x*m. The superscripts distinguish single and double precision values. 

Figure 3. 2-norm relative residuals for homogeneous (dashed) and inho- 
mogcneous (solid) gcr(k—l)/gmres(k), k — I, 2, ... , 10, applied to one 
system. The two methods perform alike except for k = 5 when the origi- 
nal, homogeneous method stagnates and the new, inhomogeneous method 
converges. 

For Figures 3, 4, 5 and 6 the system Ax = y resembles one used by Elman and 
Streit [15]. The matrix is a preconditioned discretization of 

d 2 d d \ 



dxdy dx dy 

for real-valued u on [0, 1] x [0, 1] with zero Dirichlct boundary data. The finite dif- 
ference discretization has a 33 x 33 uniform grid with 961 interior unknowns related 
by 5-point approximations to second derivatives and by centered approximations to 
first derivatives. The preconditioned system is A^ A\x — y where A\ is the discrete 
operator for a — 50, j3 = 100, 7 = 250 and where A2 is the discrete operator for 
a — ft = 7 = 0. A stabilized block cyclic reduction method performs the multi- 
plication by A2 [26 ]. The entries of y are uniformly and randomly distributed 
between —1 and 1. 
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Gcr(fc— l)/gmres(fc) is implemented in the simplest possible manner suggested 
by Table 1. The singular value decomposition solves the least squares problem 

minimize ||r n +i|| over x„+span{r n Ar n A 2 r n ... A 1 r n j 

by projecting r n into span {Ar n A 2 r n ... A k r n \. The inhomogeneous version 
solves 

minimize ||r n +i|| over span |a; n r n Ar n A 2 r n ... A k ~ x r n j 

by projecting y into span {Ax n Ar n A 2 r n . . . A k r n j . 

Figure 4. 2-norm relative residuals for homogeneous (dashed) gcr(5)/ 
gmrcs(6) and inhomogeneous (solid) oc(6,m), m= 1 2 ... 10, applied to 
the same system. 

The system of equations and the implementation of homogeneous gcr(fc— 1)/ 
gmres(fc) are described with Figure 3. The implementation of the inhomogeneous 
oc (k, to) methods is described in Section 6b. When m = 1 it is identical to Figure 3's 
inhomogeneous gcr(fc— l)/gmres(fc). 

Figure 5. Level curves of observed convergence rate for inhomogeneous 
oc (fc, to) as a function ofk and m for one system. The curves range in mul- 
tiplicative steps of 10 005 from 10 _1 at the upper right to 10°, no conver- 
gence, at the lower left. The left edge corresponds to gcr (fc— 1) /gmres(k) , 
the bottom edge to orthomin(m—l) . Figure 6 displays the convergence 
histories. Figure 6 displays the convergence histories. 

Figure 6. Convergence histories from which the level curves of Figure 5 
are derived. 

The system of equations is the one for Figures 3 and 4. Section 6b describes 
the implementation of the inhomogeneous oc (k, to) methods. Least squares linear 
fits to the last half of each curve in Figure 6's logarithmic scale produce the ob- 
served convergence rates. The level curves are obtained by extending the fits to the 
logarithmic data bilinearly throughout the cells of the k x to grid. 

Figure 7. Coefficients of inhomogeneous oc (2,2) minimizing the 2-norm 
of the residual for one system. 

Figure 8. Eigenvalues of the matrix (solid dots) superimposed on some 
level curves of the convergence rate r(A) for the constant coefficient regime 
of Figure 7. The levels start at 1 on the boundary and decrease in steps 
of 0.05. 

Figure 9. 2-norm relative residuals for the iteration of Figure 7 (lower 
solid line), and for a constant coefficient iteration with the same right hand 
side (higher solid line) and a different right hand side (dashed line). 

For Figures 7, 8 and 9 the matrix is Toeplitz and banded with — l's on the 
first supcrdiagonal and with l's on the main and first 3 subdiagonals. Careless 
programming resulted in a matrix of order 201. For Figure 7 the entries of the 
right side all equal 1. The implementation of the inhomogeneous oc (2, 2) method is 
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described in Section 6b. Figure 7 plots the 6 coefficients in the 3x2 tableaux. The 
coefficients for the hrst few tableaux are not shown because they are larger than 
the others. The coefficients for the next several iterations vary on the order of one 
percent from 

1.421 -0.421 

0.261 -0.172 

-0.130 0.102 

which are the constant coefficients used for Figures 8 and 9. These Figures are 
explained by their captions and by Section 8. The other right hand side for Figure 9 
has entries uniformly and randomly distributed between —1 and 1. 

Figure 10. Level curves of convergence rate r(A) as a function of A in 
the complex plane for Table 4's constant coefficients. The levels begin at 
1 on the boundaries and decrease in steps of 0.05. 

Like Figure 8, the curves are formed by plotting the contours of 

r(A) = maximum \X\ of all X for which P(\, X) = 

using the quadratic formula to solve P(\, X) = when m = 2. 
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